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1. INTRODUCTION 



A. BACKGROUND 

The techniques for on line identification have reached a great degree of 
sophistication in the recent years. Although many types of model structures could be 
used to describe the dynamical behavior of deterministic systems, the major emphasis is 
still on linear models. The whole problem is to try to determine which model best 
describes the dynamics under observation, in the neighborhood of the desired operating 
condition. 

In some cases, insight of the model may be obtained from the laws of Physics, 
Chemistry', or similar. In most cases however, this might not be feasible due to the 
complexity of the physical factors involved. In these instances it may be possible to 
deduce the values of the parameters by observing the nature of the system's response 
under appropriate experimental conditions. We shall call this procedure parameter 
estimation. 

The issue of parameter estimation plays an essential role in adaptive filtering, 
prediction, and control. The steps involved in a parameter estimation problem are the 
following: 

1. Select an appropriate class of models. 

2. Determine a "cost function", i.e., criteria bv which the relative performance of 
different models within the class will be judg'ed; 

3. Choose the proper test signals which reflect all range of desired operating 
conditions. In particular for engineering applications' we taik about signals 
"sufficiently rich" in frequencv, “which span the whole frequency rang“e of 
interest. 

4. Select an appropriate estimation algorithm according to on line or off line 
requirements. In the on line case, where the data are “processed sequentially to 
vield an estimate in real time, the computational complexity is a fundamental 
factor in the choice of the estimation algorithm. 

5. Make use of prior knowledge available on the svstem. It is ahvavs 
advantageous to incorporate as much prior knowledge' as possible into the 
estimation algorithm, in terms of initial conditions and possible constraints in 
the estimated parameters. 



For the particular case of estimating the parameters of linear models, we search within 
the class of models with a given fixed order for the one which minimizes a prediction 
error criterion. Test inputs are usually given in the form of white noise or sum of 
sinusoids within the frequency range of interest. 
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For what the choice of the estimation algorithm is concerned, extensive 
simulations show that techniques based on recursive least-squares present a much 
faster convergence rate than other algorithms, such as projection algorithm (Goodwin 
and Sin) [Ref. 1: p. 62]. The .reason beyond this is the fact that the cost function 
associated with the former includes all past data (and so it has memory) while the 
latter is memoryless. The result is an improved convergence rate of the estimated 
parameters, an improved tracking for systems with time varying parameters, and better 
robustness in the presence of noise [Ref. 2: pp. 1-10] and unmodelled high frequency 
dynamics at the expenses of the need for more computing power. 

The origins of the least-squares method can be traced back to Gauss [Ref. 3: p. 
63]. In its recursive version it has been formulated independently by several authors. 
The original reference seems to be Plakett [Ref. 4: p. 149]. An early treatment of the 
least-squares method applied to dynamic system identification can be found in Astrom 
[Ref. 5: pp. 123-130] and many related papers dealing with different aspects of the 
least-squares method can be found in the literature. A comprehensive treatment of 
adaptive techniques for identification is given in Ljung [Ref. 3: p. 17-20]. 

The major drawback of recursive least-squares identification is its cost in terms of 
computational complexity, which might make it unsuitable for real time identification 
of a large number of parameters. In fact it requires on line matrix manipulations, 
whose size grows with the dynamical complexity of the system to be estimated. The 
situation becomes even worse when on line identifications techniques are applied to 
multivariable systems, where the number of unknown parameters grow also with the 
number of inputs and outputs. In this respect the limitations due to serial 
computation is evident. From this list of implementational problems, the need for 
adequate computing capabilities is a crucial point for the implementation of on-line 
estimation techniques. 

In this research we analyze an on line recursive least-squares identification 
algorithm which processes sequential "blocks" of data. In particular the parameter 
estimates are determined by the least-squares solution of a redundant number of linear 
equations obtained from the measured data. Existing techniques of solutions for this 
class of equations based on QR decomposition are particularly attractive for parallel 
processing applications. In the next section we introduce the concept of QR 
decomposition and its significance for parallel computation. 
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B. QR DECOMPOSITION 

A numerically attractive method for the least-squares solution of linear equations 
is given by the QR decomposition of a matrix. To illustrate the concept consider the 
system of linear equations 



A 0 = 5 



( 1 . 1 ) 



in the unknown 0. with 0 an n-dimensional vector and 5 m-dimensional, where m>n 
(more equations than unknowns). The equal sign in (1.1) is intended to be in a least- 
squares sense. That is to say that given A, B the vector 0 in (1.1) is the one which 
minimizes the square error | A0 — B |. By using the QR decomposition [Ref. 6: pp. 
143-144] we can always decompose a given m x n matrix A as 

A = QR 



T 

with Q an m * m orthogonal matrix, i.e., QQ 1 
appropriate dimensions, and R of the form 



R = 



R 

O 



I, I being the identity matrix of 



with R n x n upper triangular. It can be shown that any solution of the equation (1.1) 
in the least squares sense is also solution of the system of n equations and n unknowns 



R 0 = B { (1.2) 

with obtained from Q and B. Also if the matrix A is full rank, the matrix R is 
invertible and the unique solution 0 can be computed from (1.2). 

The important points are the facts that equation (1.2) does not require explicit 
matrix inversion and can be solved by successive substitution, and a large number of 
techniques can be used to arrive to equation (1.2). In particular the Givens Rotation 
without square root [Ref. 7: pp. 215-218] to be presented in the subsequent chapters is 
attractive for systolic arrays implementation. 
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C. SYSTOLIC ARRAY 

An efficient way of handling such procedure in QR decomposition is by using 
parallel processing structures, such as systolic arrays or wavefront arrays. The idea of • 
a systolic array was developed by Kung and Leiserson. [Ref. 8: pp. 256^28 1] It consists 
of an array of individual processing cells that are arranged in the form of a regular 
structure. Each cell in the array is provided with local memory of its own and each cell 
is connected only to its nearest neighbors. The array is designed such that wavefronts 
of data are clocked throughout in a highly rhythmic fashion, much like the pumping 
action of a human heart; hence, the name "Systolic”. These systolic arrays enjoy 
simple and regular communication paths, and almost all processors used in the 
networks are identical. As a result, special purpose hardware based on systolic arrays 
can be built inexpensively using VLSI technology. What is interesting in high- 
performance parallel structures is that it can be implemented directly as a dedicated 
hardware device. 

In this research, the recursive algorithm, based on recursive least-squares with 
covariance resetting, is redesigned in order to make it suitable for implementation on a 
VLSI chip and also modify the systolic array in order to reduce computing time and 
complexity. The difference in adaptive control is that the estimator has to operate 
recursively on subsequent blocks of data, so that the estimated parameters converge 
asymptotically to the respective correct values. Asymptotic convergence is guaranteed 
by initializing each estimation with the parameter values from the previous one, and by 
persistency of excitation of the external inputs. 

This research report is divided as follows; Chapter II discusses on line estimation 
and recursive least-squares, while Chapter III analyzes the parallel algorithm using QR 
decomposition and systolic array. In Chapter IV. shows the simulation results of 
parallel algorithm, comparison and conclusion is discussed in Chapter V. 
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II. RECURSIVE LEAST-SQUARES ALGORITHM 



A. LINEAR MODELS FOR DISCRETE TIME SYSTEMS 

Consider a dynamical system with single input signal u(t) and single output signal 
v(t). In a sampled data framework these are numerical sequences obtained from 
sampling continuous time signals at a frequency W $ . Without loss of generality 
assume the sampling interval T to be one time unit, and let the time index be t = 

1,2,3, A linear discrete time model of the discrete time system is given by the 

difference equation 



y(t). + ajy(M) + ... + a n y(t-n) = bju(t-l) + ... + b m u(t-m)+ v(t) (2.1) 

where v(t) denotes a disturbance term to be specified. We shall use operator notation 
for conveniently writing difference equations. Thus let q'* be the backward shift or 
time delay operator 



q* 1 x(t) = x(t-I) 

Then (2.1) can be written as 



( 2 . 2 ) 



A(q‘ 1 )y(t) = B(q’ 1 )u(t) + v(t) (2.3) 

where A(q*^) and B(q*^) are polynomials in the delay operator: 

A(q _1 ) = 1 + ajq' 1 + + a n q' n 

B(q‘ ! ) = bjq' 1 + b 2 q' 2 + ... + b m q’ m 

The model (2.1) or (2.3) describes the dynamic relations between the input and the 
output signals. In particular the polynomials A.B model the linear part, while the 
disturbance term v models disturbances(at the input and output), nonlinearities, and 
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anything which is not accounted by the linear model. An alternative way to represent 
the linear model is by defining the parameter vector. 



0 (aj,...., a n ,b|,....,b m ) 

the regression vector oflagged input-output data, 



<P T (t) = [ -y(t-i), .... -y(t-n), u(t- 1 ),..., u(t-m) ] 



(2.4) 



and write (2.3) as 



y(t) = 0 T <p(t) + v(t) 



(2.5) 



Beyond the clear relation between (2.3) and (2.5), the significance of the regression 
model is two folded: 

a) it highlights the linear relationship between the observed signals y(t), (p(t) and the 
parameters 0 of the model, and 

b) if we assume v(t) to be a sequence of n uncorrelated random variables (such as 
white noise) the predicted value of y(t) given the measurements can be easily 
expressed as 



In this research we are interested in the problem of estimating the parameter 0 from 
measurements of the input, output sequences. 

B. ON LINE PARAMETER ESTIMATION 

It is possible to estimate the values of the model parameters 0 by observing the 
nature of the system's response under appropriate experimental conditions. The idea 
for the recursive identification problem is to compare the output with that of an 
adjustable model, and update the parameters until the error between the outputs of 
plant and model is minimized, in some sense. The procedure is schematically described 
in Figure 2.1. 



y(t) = 0 T q>(t) 



( 2 . 6 ) 
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- Let us consider the difference equation model in (2.5). We desire to estimate the 
parameter vector 0 from a set of measured data y(t), <p(t); t = 1,2. ..., N. In a least- 
squares framework, we choose this estimate by best fitting the linear model to the 
available data. That is, we define a criterion function 



V N (0) = « t [ y(t) - 0 T <p(t) ] 2 (2.7) 

to be minimized with respect to 0. 

The inclusion of the coefficient a t in the criterion (2.7) allows us to give different 
weights to different observations. In applications, most often is either identically 
equal to 1 or expresses a forgetting factor of the form in order to give a higher 
weight to more recent data. (Ref. 3: p. 17] 

A T> 

If we denote by y ( 1 1 0 ) = 0 1 <p(t) as the prediction of y(t), given the parameter 
vector 0, the criterion (2.7) can be seen as an attempt to choose a model that produces 
the best prediction for the output signal in the least-squares sense. The cost function 
can be minimized analytically with respect to 0, to obtain the estimate 
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( 2 . 8 ) 



0(N) - ( t f 

which is defined, provided the inverse of the matrix involved exists. To obtain a 
recursive solution (2.8) denote 

R(t) = J_a t (cp(k)<p T (k)) 

K 1 

and, from (2.8), write 

^ « t <p(k)y( k ) = R(t — 1) 0(t — 1). 

K. I 

From the definition of R(t) it follows that 
R(t 1) = R(t) - a t <p(t)(p T (t) 
hence standard algebraic manipulation lead to the recursion 



0(t) = R _1 (t) ct k <p(k)y(k) + a t <p(t)y(t) ] 

- 0(t-l) + R* 1 (t)(p(t)a t [ y(t) - 0 T (t-l)<p(t)] 



(2.9) 



The algorithm in (2.5) is not well suited for computation as it stands, since a matrix 
has to be inverted in each time step. It is more convenient to introduce the matrix 

PM - R -1 M 

and update P(t) directly on the basis of the matrix inversion lemma [Ref. 3: p. 19] This 
leads to the recursion 



P(t- l)<p(t)<p T (t)P(t- 1) 
P(t) = P(t-l) - — 

l/a t + (p T (t)P(t- l)tp(t) 



( 2 . 10 ) 



The advantage using of (2.10) is that the inversion of a square matrix of size equal to 
dim 0 is replaced by the scalar division. Thus the least- squares estimate 0(t) defined by 
(2.8) can be recursively calculated as 
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P(t — l)tp(t) 



0(t) = 0(t- 1) + 



P(t) = P(t-l) - 



(y(t) -e T (t- l>p(t) 1 



l/«t + <p T (t)p(t-l)<p(t) 
P(t- l)<p(t)<p T (t)p(t- 1) 
l/« t + <p T (t)p(t — l)(p(t) 



( 2 . 11 ) 



The problem with this algorithm is that the matrix R(t) might not be invertible 
and therefore P(t) might not exist. This is the case when t ^ dim <p, since in this case 
we can write 



<p T (l) 



R(t) = [<p(D, <p(t) ] 



= 0> t <D 



T 

t 



<p T (t) 

T 

when has more columns than rows(as in the case of t < dim (p) we can always 
find a non zero vector X of appropriate dimension such that 



(P t T X = 0 

which shows R(t) to be a singular matrix whenever t ^ dim <p. This problem of 
existance is addressed in the next section, where the recursive least-squares algorithm is 
modified to accommodate initial conditions and block processing techniques. 



C. RECURSIVE LEAST-SQUARES ALGORITHM 

The algorithm (2.11) can still be applied if we modify the cost function (2.7) as 



V N (0) = -4-5/ y(t) -<p T (t)0) 2 + 4-10 -0 A (O))P(O)' l (0- 0(0)) ( 2 . 12 ) 

- N 2 t=l 2 

where P(0) and 0(0) are the initial conditions of P and 0 respectively. Basically, the 
cost in (2.12) represents the sum of squares of errors e(t) = v(t) - <p 1 (t)0, which is 
the difference between the actual observation y(t) and the value predicted by the model 
with parameter vector 0. The second term on the right hand side of (2.12) has been 
included to account for the initial parameter estimate. Relevant observations for the 
recursive least-squares are given in the remainder of the section. 
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Observation 1: The recursive algorithm requires initial values for terms 0 and P 
as 0(0) and P(0) in order to start the recursion at t = 0. The relative importance of the 
initial values on the recursive estimate decays with time, since the leftmost sum in 
(2.12) becomes preponderant with N encreasing. At the same time the entries of the 
matrix P become small with t -+ °o for any initial conditions P(0). 

The initial value P(0) of the matrix P appears in the cost function (2.12) as the 
weight given to the initial estimate 0(0), and it has to be positive definite. For 
convenience it is often chosen as P(0) = OI, where C is some positive constant. In 
particular P(0) expresses the "confidence" we have on the initial conditions 0(0), and it 
is set to have large entries if we have poor confidence on the initial conditions 0(0). 
This can be easily seen in the cost function (2.12) where a very large P(0) would give a 
very small weight to the term 0 — 0(0) associated with the initial conditions. 

Observation 2: The least-squares algorithm generally has much faster 

convergence than other algorithms, such as the projection algorithm, and it can be 
used essentially unaltered with noisy signals [Ref. 1: P. 62]. We now discuss a number 
of variants of the least-squares algorithm. A problem of the recursive least-squares 
algorithms defined in (2.11) is that in general the matrix P(t) tends to zero as t co. 
This can be seen from the definition of P(t) in (2.11) and the matrix inversion lemma 
which yields 

P(t)' 1 = P(t-l)' 1 + cp(t)(p T (t) 

in the scalar case <p(t)<p (t) is a non-negative quantity, and therefore the sequence 
P(t)** is nondecreasing and might tend to infinity (i.e..P(t) might go to zero). The 
same result holds in the non-scalar case shown in (Goodwin and Sin) [Ref. 1: pp. 
54-58]. The consequence of P(t) — *■ 0 is that the algorithm becomes less sensitive as t 
— ► co. There are several ways to cope with this problem. We can either discount past 
data with a forgetting factor, or periodically reset the covariance matrix to nonzero 
constant values. The effect of these two modifications is the ability to apply the 
estimation algorithm to systems with time varying dynamics. 

Although the algorithm presented and related analysis assumes a SI SO plant for 
simplicity, they can be extended directly to the multivariable case (MIMO). 
Furthermore the plant is assumed to be causal, having unknown parameters and 
known order. The problem of order determination and validation involves criteria 
which go beyond the scope of this research. Therefore it will be always assumed that 
the order of the chosen model is a parameter given to the designer. ’ 
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D. RECURSIVE LEAST-SQUARES WITH COVARIANCE RESETTING 

The approach in this research is based on recursive least-squares with periodic- 
covariance resetting, and block processing. By block processing approach we mean 
that the time set Z_j_ is divided into segments Ij, of fixed length N as 

oo 

z_ = u I k 
+ k= 1 k 

where 

I k = [teZ + : (k-l)N < t < kN ] 

and the parameter vector 0(the coefficients of A and B) in (2.3) are kept constant 
within each interval I k . 

The serial implementation call for the estimation 0(t) by the recursive least-squares 
shown in equation (2.11). 



[ y(t) -0 T (t-l)<p(t)] 

(2.13) 



and since the adaptive gains are updated at t = kN only (once for every block), only 
the estimates 0^ are considered. 

There are several reasons which justify the block processing approach. For 
example one application of on line estimation is found in adaptive control. 

As shown in Figure 2.2, in this class of problems, the output of the on line identifier is 
used to set the values of the parameters of a linear compensator. In an adaptive 
control context it has been shown that an external command u(t) in (2.1) sufficiently 
rich in frequency, together with blocks 1^ of sufficient length N provide a guarantee for 
a consistent estimation as 

0(t) -» 0 (0 representing actual parameters). 



0(0 = 0(t-D + 



p(t) = p(t-i) - 



p(t- D<p(t) 



i + <p T (t)P(t-i)<p(t) 
pit - i)(p(U(p T (t)P(t- 1) 



1 + 9 i (t)P(t“ 1)<P(0 
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Figure 2.2 Adaptive Control. 

As mentioned in the previous section the parameter estimation we consider is 
such that the covariance matrix P f — » 0 and the estimator in (2.13) looses sensitivity to 
variation of 0 as t -* oo (Ref. 1: p. 65]. Although the covariance matrix P t can be 
reset at any time, for convenience of analysis we assume that it is reset at tne beginning 
of each block as 



P ''kN-l “ ’’o 2 I (2.W) 

with I the identity matrix and <j 0 an arbitrary positive constant. The result discussed 
above are formalized in the following theorem, proved in [Ref. 1: pp. 60-61]. 

Theorem 

Let a SISO discrete time linear system be modeled by the difference equation 
(2.1) with v(t) = 0. Also let 

1. n= degree A(q'*) = degree B(q'*). 

2. A(0) = 1, B(0) = 0 (i.e. the system is causal). 

3. let 0 = [A,Bj^, A, B being the coefficients of A(q*') and B(q'^). 

4. the coefficients of A(q'l), Bfq**) in (2.3) be held constant within the time 
intervals Ij,, with N = |Ij. | > fOn. 
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5 the external command sequence u(t) have at least 4n different sinusoidal 
components. 

6. the estimated sequence 0(t) be as in (2.13). 

Then under these conditions, lim 0 t = © exponentially. 

t — * 

The extension to bounded disturbances v(t) * 0 can be shown under the condition that 
|v(t)| is sufficiently small compared with the input and output signals. 
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III. APPROACH TO PARALLEL ALGORITHM 



A. IMPLEMENTATION OF THE PARALLEL ALGORITHM 

The real time identification algorithm with recursive least-squares, covariance 
resetting and block, processing in equation (2.13) and (2.14) is redesigned for parallel 
implementation. By definition, using the minimization of quadratic cost function in 
(2.12), the estimate 0(t) at time t minimizes the quadratic cost function 



V t,k( 0 ’ 0 kN ) “ 

. • <p(j) T e i 2 + (6- e kN )P kN .,->< e-e kN ) 



(3.1) 



with t s I k+ j (i.e., kN^t<(k+l)N) and P^-i"^ = ^ The * act t ^ iat c ^ e 

covariance matrix P t is reset to initial conditions at each t = kN-l, is equivalent to 
least-squares estimation in block I k + j with initial conditions 0 k ^. For t = (k+ 1)N-1 
the cost function V in (3.1) can be written as 



v k ( 0 ) = W k (0) T W k (0) 



(3.2) 



where 

w k (0) T = [ e k ^(0), ..., e^ k+ . i)x-l( 0 ) ’* ^o^ 0 ‘ 0 kN^ T J 



e t (0) = y(t) - <p(t) T 0 



By the above definitions we can write W k as 
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y((k + l)N-l) 



<p((k + l)N-l) T 





e 



(3.3) 



with B ^ 6 R( d + N ) x 0 € x A^ £ R'^ + ^ s ) x ^ where d is dimension of <p. 

The algorithm we analyze computes G^nj by minimizing the cost function in (3.1) 
based on a QR decomposition of the matrix Ay. in (3.3). The estimated sequence Gj,^- 
so obtained is therefore identical to the one obtained by recursive least-squares with 
covariance resetting in (2. 13), (2. 14) at t = kN. 

B. SOLUTION OF THE LEAST-SQUARES PROBLEM USING QR 
DECOMPOSITION 

A numerically attractive method of solving the least-squares error problem is the 
QR decomposition of a given matrix. By applying the QR decomposition method 
described in the introduction, at each time block k define Qj. and as the QR 
decomposition factors of Aj. as 

Q k A k = n > m (3.4) 

or equivalently 



A k “ Qk T 2k 



(3.5) 



with Q^ 6 R( d + ^) X ^ an orthogonal matrix so that 



Qk T = Qk' 1 

and R y 6 r(^ + x d partitioned as 
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5k - 



Ri 



O 



(3-6) 



R k e x d U pp er triangular matrix, and O the null matrix. Applying the same 
transformation Q k to g k in (3.3) partition the result as 



Qk 5k ~ 



Ski 

5 k 2 



(3.7) 



with eR^ x *. By definition of V k in (3.2) and orthogonality of Q k we can write 



V k (0) = W k T (6)Q k 1 Q k W k (0) 



T, 



and by in (3.3) - (3.7), 



-Q k w k (6) = 



R k 0 




Ski 


0 




Sk2 



where S k 2 is independent of 0. Therefore 

v k ,e> - I R k 9 - Ski I 2 + I s k2 ! 2 

In the above equation if the matrix R k is nonsingular then the cost function V k is 
minimized by 



0 (k+l)N 




5k 



(3.8) 



Notice that equation (3.8) does not require any explicit matrix inversion since R k 
is in upper triangular form. Therefore by the QR decomposition introduced above, we 
can compute 0^ k + using two processors Pl,P2 in cascade as in Figure 3.1. 

In Figure 3.1, processor PI computes R k< B k p while processor P2 computes 0 from 
(3.8). In the next section we see how use parallel computational structure to compute 
R k , B% | and solve equation (3.8). 
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Figure 3.1 Two Step Computation for Q^r. 

C. PARALLEL SOLUTION OF THE LEAST-SQUARES ALGORITHM 

This section is devoted to the solution of (3.S) using a parallel structure, and we 
introduce an algorithm suitable to the computation of the QR factorization of any 
given matrix of dimension n x m (n>m). Although several techniques exist in the 
literature, the Givens Rotation is prefered in parallel computing structures, since it 
successively manipulates two adjacent rows of the matrix at a time. For this purpose 
we define the Givens operator Q(p,q), which performs a rotation of 2 x 1 dimensional 
subvectors of the matrix A in order to annihilate the element a^p. In the applications 
to parallel processing, the rotation is determined based on adjacent elements of the first 
column, then it propagates to the successive right columns of the matrix. The Givens 
rotation provides matrix triangularization by affecting two rows at a time and the 
operations are limited to neighboring data in the matrix. Basis of the Givens rotation 
is the matrix 



Q(p,q) = 



0 

r(p.q) 

0 



0 

0 

I? 



(3.9) 
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associated to each pair of indexes p,q e [1 ,d 4- 1 with Ii and Io identity matrices of 

~ Ox? 

dimensions (q-2)*(q-2) and (d+ 1-q) * (d+ 1-q) respectively, and r e R“ of the 
form 



r(p.q) 



c(p,q) s(p,q) 

-s(p,q) c(p,q) 



(3.10) 



Application of the transformation Q(p,q) to any matrix A of appropriate dimensions, 
yields 



Q(p,q)A = 



xxx 



x 1 x 

x 0 x 



q 



X 



f 

p 



X 



(3.11) 



provided the terms c(p,q), s(p.q) in (3.10) are such that 



c(p.q)a q -l,p + s(p,q)a qp = 1 
c(p,q)a qp - s(p,q)a q . I p = 0 

which yields the unique solution of rotation parameters 



c(p.q) 

s(p,q) 



, a q-Up 



a n i 4 a “ 
q-l,p qp 

a p.q 

7~ r 

a n I „ + a." 
q-I,p qp 



(3.13) 



the notation x in (3.1 1) just indicates other terms of the matrix. 

Observation: Notice that at each time block k, the matrix A^ as in (3.3) has full 
rank provided the parameter cr 0 is nonzero. Therefore the least-squares solution of 
(3.3) is always unique for each block k, and as a consequence the diagonal elements of 
are always nonzero. 
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An example of application of the Givens Rotation in a three dimensional case is 
the following 



r ll • 


r 12 


r 13 




<Pl 


q>2 


(p 3 


0 


r 22 


r 23 




a ll 


a 12 


a 13 


0 


0 


r 33 


= Q(3,4)Q(2,3)Q(1,2) 


0 


a 22 


a 23 


0 


0 


0 




0 


0 


a 33 



Each transformation Q in (3.14) is performed in two steps: At first, we compute c and 
s defined in (3.13) and then we perform the linear combination of rows q and q-1 of 
the A matrix in order to obtain the upper triangular matrix Rj. in (3.14). By using this 
idea, we can implement the solution of (3.8) in a parallel structure. For each time 
block Ij, define the initial matrices 








(3.15) 



9^\j being the estimate from the previous block of data I^_j. Then Rj. in (3.8) can be 
determined as Rj. = Rj,' S " ^ where RjJ is recursively computed as 



<p T (kN+ 1) 

■R? T 

v(kN + j) 

Ik ?' 1 



= Qk j 


R k J 

0 


= Qk j 


Ski’ 

Sk; 1 



j = 0,1 N-l 



(3.16) 



These operations are performed by the network of cell processors shown in 
Figure 3.2. The processors work simultaneously at the same clock pulse in each cell 
and their operations will be described by the transition and output function at the next 
section. In Figure 3.2, the regression vectors (p(t) are at the top of the array, and the 
processors operate with the given data simultaneously. The details of the 
implementation are given in the next section. 
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Figure 3.2 Systolic Array. 

D. SYSTOLIC ARRAY IMPLEMENTATION FOR PARALLEL STRUCTURES 
1. General Description of Systolic Arrays 

In this section we show how the least-squares algorithm presented above can 
be implemented in a systolic array structure. In particular we aim at a computer 
structure which accepts as input the sequences of regression vectors tp(n) and signal 
y(n) and outputs least-squares estimates of the parameter 0 by block processing. The 
two processors described above (PI, P2. in Fig. 3.1) can be implemented as arrays of 
cells as shown in Figure 3.3, where the particular case of d = 4 is presented. In the 
block processing we had mentioned in Chapter II, the processor P!(triangular section) 
operates during the given block time, while P2(linear section) operates only starting at 
the "reset" signal at the end of each block, all data in the triangular section flow to the 
linear section. 

The entire systolic array is controlled by a single clock signal. Each array 
consists of two types of processing cells; internal cells (represented by squares) and 
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Figure 3.3 Systolic Array Implementation for Recursive Least-Squares Algorithm. 

boundary' cell (represented by circles). The specific arithmetic function of the cells will 
be defined later. The triangular systolic array section implements the Givens Rotation 
part of the recursive least-squares algorithm, whereas the linear systolic array section 
computes the least-squares weight vector at the end of the entire recursion without any 
matrix inversion. 
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2. Triangular Systolic Array 

The triangular systolic array performs the Givens Rotation described above. 
In particular the boundary cells compute the rotation parameters c and s as in 
Equation (3.12), while the internal cells compute the linear combination of adjacent 
rows. The efficiency of the systolic array comes from the fact that each cell operates 
continuously on the data. In fact once a term in the matrix is updated and the results 
are passed to the cells on the right and below, new data can be immediately fed and 
computed. The mutual relation among the computing stages of different cells at the 
same time t is shown in Figure 3.3 where the label on each cell (i,j) indicates the order 
in which the data enter the array. 




Figure 3.4 Computing Sequence of Triangular Array. 

The operation of the triangular systolic array are summarized in Figure 3.5 which 
shows the boundary and internal cells. Basically, the internal cells perform only 
additions and multiplications, the boundary cells are considerably more complex in the 
sense that they compute three multiplications, one addition and one division at each 
cycle. 
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Figure 3.5 Cell Definition for a Triangular Systolic for Performing Orthogonal Triangularizatioi 

Each cell of the triangular systolic array section stores a particular element of the 
upper triangular matrix R(n) in (3.6) and it is initialized to zero for internal cells and 
<r 0 for the boundary cells as mentioned in (3.15). The function of each row of 
processing cells in the triangular systolic array section is to combine one row of the 
stored triangular matrix with a vector of data received from the above cells in such a 
way that the leading element of the received data vector is annihilated. The data 
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vector so obtained is then passed downward on to the next row of cells. The boundary 
cells in each row of the section computes the pertinent rotation parameters and then 
passes them to the right on the next clock cycle, as shown in Figure 3.5. The internal 
cells subsequently apply the same rotation parameter to all other elements in the 
received data vector. 

Since a delay of one clock cycle per cell is incurred in passing the rotation 
parameters to the right along a row, it is necessary’ that the input data vectors enter the 
triangular systolic array in a skewed order, as illustrated in Figure 3.3. This 
arrangement of the input data ensures that each column vector <p(n) of the data matrix 
propagates through the array, it interacts with the previously stored triangular matrix 
R(n-l) and undergoes the sequence of Givens Rotation Q(n). 

At the same time as the orthogonal triangularization process being performed 
by the triangular systolic array section, the column vector S{ n) is computed by the 
rightmost coiumn of internal cells. In effect, this computation is-made by treating the 
desired response vector Y(n) as an additional column that is appended to the data 
matrix (p(n) at its right end. When the entire orthogonal triangularization process is 
complied, each particular row of the upper triangular d x d matrix R(k) and the 
associated d * 1 vector B(k) along the corresponding wavefront is clocked out for 
subsequent processing by the linear systolic array section at the last clock time of the 
block processing period 1^. 

3. Linear Systolic Array 

The linear systolic array section consists of one boundary cell and (d-1) 
internal cells that perform the arithmetic function defined in Figure 3.6 in accordance 
with equation (3.17). 



Zd-i< 0 > = 0 

Z M (n+l) = Z-(n) 4- r-j(k) 0j(k) (3.17) 

e L (k) = S x (k) - Z-(n) 



where i is from 1 to d-1, k is current NO. of block, n denotes the clock cycle in the 
linear section, Z^(n) are intermediate variables, and r-(k) are elements of upper 
triangular matrix R(k). S-( k) are element of the vector B( k) and 0-(k) are element of 
least-squares weight vector 0(k). This section of the processor computes 0(k) by using 
the method of backward substitution [Ref. 8: pp. 272-274]. 
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Figure 3.6 Cells for Linear Systolic array. 

The boundary' cell performs subtraction and division, whereas the internal cell 
perform additions and multiplications. The elements of the least-squares weight vector 
appear at the output of the boundary cell at different clock cycles, with Sj(k) leaving 
this cell first, followed by S-_j(k), and so on right up Hj(k). In effect, the element of 
the weight vector 0-( k) are read out backward, i.e., i = d-1, ..., 2, 1. 

4. Solving The Estimated Parameters 

Once the triangular matrix R(k) and the corresponding vector B( k) are 
computed, next task is to transfer the data to the second processor(the linear array) 
and solve for the estimated parameters. This operation is performed at the end of each 
time block(t = kN). In order to shift the entries of the matrix R(k) and vector fl(k) of 
the triangular array it is easy to see from the ceil processors in Figure 3.5, that values 
c = 0, s=-l of the rotation parameters cause the data in the cells to shift to the cell 
below. This operation can be successively propagated to all cells on the right. This 
operation is triggered by the "reset’' command at the end of each time block. The reset 
command is given as an impulse to the boundary' cells in the triangular systolic array. 
At the next of the reset command, the rotation parameters in the boundary cells are 
generated the values c= 1, s = 0 and pass them to rightward in order to shift down all 
stored datas in the triangular section without any computation in the internal cells. 
Let us consider the case where the dimension of <p(k) is 4(d = 4). After the "reset” 
command, the entries of 4 x 4 matrix R(k) and 4* 1 vector B{ k) are shifted down to 
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the linear array with time sequence as shown in Figure 3.4. All cells in the linear array 
are initialized tc zero. The variable Zj(n) arc forced leftward through the processor 
with accumulation inner product terms in the rest of the processor as it moves to the 
left in Equation (3.17) while r— and bj are lowered as indicated in Figure 3.7. The left 
end processor is dillerent in that it computes 0-(k) by function in equation 3.17. At 
any time bj reaches the boundary cell, it is combined with Zj(n) and the processor 
computes the weight vector 0(k). 




Figure 3.7 The Linearly connected Systolic Array. 
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We generate the element 0j(k), 0^_|j(k) and 0j(k) of the least-squares weight vector 
0(k), in that order, at the output of the boundary cell, and transferred the matrix S(k) 
as a part of the initial condition of the next block processing 1^ + j as shown in Figure 
3.8. [Ref. 10: pp. 514-515] 




Figure 3.8 Overall Computation Structure. 

In the case of SISO, the triangular systolic array operates during the interval 
1^. Another time period is required for computation of the parameter Qj(k) in the 
linear systolic array section. It turns out that the exact additional clock times for the 
linear systolic section(represented by Ij) is proportional to the dimension of 
<p(n)(Ij= 2d) and can be expressed as a O(d) which is showed in Figure 3.9. In fact the 
solution of the triangular system requires a number of d substitutions. 

To summarize, the total clock time required for computation of 0(k) is N + 2d. The 
operation involved at the individual clock cycles of the linear array section are 
described in Figure 3.10 for the case of d = 4. 
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Figure 3.9 Block Processing. 



From the above description we see that advantages and disadvantages are 
associated to this parallel estimation scheme. A clear advantage is the parallelism in 
computation, as opposed to the serial nature of recursive least-squares, which makes it 
attractive when a large number of parameters has to be estimated. A disadvantage is 
that this technique is effective only in conjunction with block processing. This is due 
to the fact that we always need the time to solve for the linear system in the linear 
section. Therefore the parameter estimates are updated only at the end of the block 
time, contrary to the serial case when updated estimates of the parameters are available 
at any time. 
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Figure 3.10 Operation of The Linear Systolic Array for d = 4. 
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IV. SIMULATION STUDY 



A. MODEL SET 

The linear model in general can be expressed by either the difference equation 
relating the input and output sequences, or in a regression form as in the previous 
Chapters. In the examples below we consider the problem of estimating the 
parameters of a third order model using the systolic technique described in the previous 
Chapter. Consider the class of models with difference equation 



y(t) + ajy(t-l) + a 2 >’(t- 2 ) + a^to) = bju(t-l) + v(t) 



(4.1) 



where u(t), v(t) are the input and output sequences and v(t) is an added disturbance. 
An alternative way to express (4.1) is obtained by defining the vectors 



0 1 — ( a j, a2* bj ) 

cp T (t) = [ -y(t-l), -y(t-2), -y(t-3), u(t-l) J 



(4.2) 



which leads to the regression model 



>’(t) = 0 T q>(t) + v(t) (4.3) 

This model describes the observed variable y(t) as linear combination of the 
components of the observed vector cp(t) plus noise. In the parameter estimation 
problem the values in 0 are assumed to be unknown. 

1. Noise Model 

The most realistic model has to include noise in system and measurement. In 
general, v(t) in equation (4.1) is a sequence of independent random variables with zero 
mean values, for an ideal "white noise" source. When the noise is "colored" (i.e., its 
samples are correlated) we need to encrease the order of the difference equation to 
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include the dynamics of the noise model. In this case a higher tnodei order is required 
which incorporate also noise dynamics [Ref. 3: p. 265]. From a physical point of view 
it is common to work with disturbances that are "measurement errors" or "output 
error" expressed in equation (4.4) 



Biq* 1 ^ 

y(t) = | )U(0 + v(t) (4.4) 



the difference between this model and equation 4.1 is also illustrated in Figure 4.1. 




Figure 4.1 Noise Model -(a) Equation Error (b) Output Error. 

Identification methods that use the model in equation (4.4) are often known as output 
error methods or model reference identification methods. In this research, we simulate 
two types of model using parallel identification algorithm. 

2. Choice of Initial Value in The Systolic Array 

For a recursive algorithm, we need to initialize the systolic array with an 
initial parameter estimate 0(0) and initial covariance matrix(<J 0 I in equation 2.14). The 
choice of these initial values will be discussed in this section. In equation 2.14, <r Q is 
related to the confidence we have on the initial condition 0(0). If some prior 
information about 0(0) is available it should of course be used for determining suitable 
values of 0(0). In particular as we have mentioned in Chapter 111 at the beginning of 
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the new block. Ii^ j we use the estimate obtained at the previous time block. If no 
prior information is available, the most common choice is to take 

6(0) = 0, p- ! (0) = (T q 2 I (4.5) 

where < 7 Q " is a large number for fast convergence. In this research we use different 
values of <? o (0) for comparison of the results. 

B. PARALLEL ALGORITHM SIMULATION 

The program which is included in this research report in the Appendix is used to 
investigate the parallel algorithm on the basis of the systolic array presented in the 
previous chapters. Consider a plant with discrete time transfer function 



b,z" 

H(z) = r- — (4-6) 

z“ + ajZ~ + a-iZ + a^ 

The plant has two zeros at z = 0 and three stable poles in the z-plane. The 
difference equation associated with the transfer function (4.6) can be expressed as 



y(t) + a j y(t- 1 ) + a 2 y(t-2) + a 3 y(t-3) = bju(t-l) (4.7) 

For identification purpose the input is "sufficient rich" in frequency in order to 
excite all modes of the system. This translates in the requirement that the input u(t) 
must contain a sufficient number of sinusoids. In particular u(t) should contain at 
least 4n different sinusoidal components as mentioned in Chapter II. Writing (4.7) in 
regression form we obtain 



y(t) = (p T 0 + v(t) 



where 

<P T (t) = [ y(t-l), y(t-2), y(t-3), u(t-f) ] 

and 



(4.S) 
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0 = 



a l 

a 2 

a 3 

b l 



0 being the vector of the actual parameters of the plant. 
Example 4.1) 



Consider the discrete time system with transfer function. 
1.3 z~ 

H(z) = r 

( z - 0.5 

linear difference equation 



y(t) - 1.5v(t-l) + 0.75y(t-2) - 0.125y(t-3) = l.3u(t-l) (4.9) 

and input sequence 

u(t-l) = cos(nJt/10) + cos(ntt/5) + cos(n7T/2) 4- cos(3nrc/5). 

We applied the parallel identification algorithm using <? Q = 1, and 100, no disturbance 
present and N = lOn as the block length for the triangular systolic array section. The 
corresponding plots are given in Figures 4.2 and 4.3. We can see that in ideal 
conditions all estimated parameters converge to their correct values. The estimated 
weight vector plotted versus time for number of block (represented by NB). 

1. Choice of Optimal Block Length 

In the previous chapters, we set the time block length to N ^ lOn. The 
reason of this has been highlighted in Chapter I, and it gives a sufficient condition for 
global stability in adaptive control. In the simple case of parameter estimation we do 
not have to maintain this restriction. An important factor is to achieve fast 
convergence, whenever possible and to find an optimal block length for fast 
convergence. The influence of the block length on the convergence rate will now be 
illustrated by means of simulation in example 4.2. 
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Example 4.2)( choice of optimal block length ) 

For the illustration of optimal length, we have used the same system shown in 
equation (4.9) with different block lengths. It has been simulated with 3 different block 
lengths without noise and with noise condition from equation error ( S/N = 10 ) in 
order to test the convergence rates in the different cases. So, we have used the same 
amount of clock cycles for the comparison of each results using different block lengths. 
It means that the scale of the X-axis has exactly the same amount of clock cycles. The 
numerical computation for the required number of blocks are summarized in Table 1 in 
order to expressed the number of time blocks in each simulation. 

TABLE 1 

REQUIRED BLOCK NO. 



Block Length 
N ~ 


Entire Block Length 
Per One Block (NT- 2d) 


Total Clock 
Cycles 


Required 
Block NO. 


5n 


15 + 8 = 23 


5704 


248 


lOn 


30 + 8 = 38 


5700 


150 


I5n 


45 8 = 5a 


5724 


108 


20n 


60 +, 8 = 68 


5712 


S4 



The results are shown in Figures 4.4 through 4.6 without noise condition, and 
the corresponding outputs with noise condition are in Figures 4.7 through 4.9. 

It turns out that the estimated weight vector show's the fastest convergence 
rate w'hen used with N = 15n. Even thought the estimated parameter a^ converges 
very fast at N = 5n, other estimated parameters exhibit slow convergence rate 
compare to the case of N = 15n, regardless the noise condition. Therefore the optimal 
choice of block length must be the case when N = 15n, and there is one more 
interesting comment in these results. We realized that within certain limits, the longer 
the block length, the faster the convergence rate, but when we extended the length X 
= 20n, the convergence rate of that result is similar to the output of N = lOn, and for 
the more accurate comparison w'e can look at the Figure 4.3 (N=l0n) and 4.7 
(N=20n). As it w r ere. the conclusion with this example is that the convergence rate to 
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I-i^urc 4.4 Example 4.2 - Parameter Estimation with N = 5n, W/O Noise. 
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1'iuurc 4.5 Example 4.2 - Parameter Estimation with N = 15n, W/O Noise. 
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1- iy me 4.7 Example 4.2 - Parameter Estimation with N = 5n, S/N 
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l it>ure 48 l:\ample 42 - Parameter Estimation with N = I5n, S/N 
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be increased according to increasing of the block length, but if the block length exceed 
certain saddle point then the convergence rate starts to decrease. This optimal block 
length (N = 15n; is applied for the remaining simulation studies. 

2. Application to Systems with Varying Parameters 

A desirable application of adaptive identification is the tracking of time 
varying parameters. Then there is no obviously tradeoff between tracking ability and 
noise sensitivity [Ref. 3: p. 274] and it will be impossible to exactly follow parameters 
with a high rate of change. However, a slow time variation of the actual parameters 
can be often be tracked reasonably well. To illustrate this approaches consider the 
example below. 

Example 4.3) ( tracking of parameters with time-varying ) 

Consider the system with model as in equation (4.11) and time varying parameters. 
We simulated the input-output behaviour of the system for a number of block 
(represented by NB) = 150 using the parameter values 

aj = 1.8 

a 2 = -1.08 if 30 < NB < 60, 90 < NB < 120 

a 3 = 0.216 
bj = 1.0 

aj = 1.5 

a-> = -0.75 otherwise 

a 3 = 0.125 
bj =' 1.3 

The parameter estimates obtained with this example are displayed in Figure 4.10. 

From the result, we can show that all estimated parameters converged very' 
well to their expected values. 

Example 4.4) ( tracking of parameters with noise ) 

We take the two different types of model set for simulation in this example. 
For the case of equation error, the system is 



y(t) = a jy(t-l) -l- a 2 y(t-2) + a 3 y(t-3) = bju(t-l) + v(t) 



(4.10) 
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l igurc -1.10 hxample T3 - Tracking of Parameter with Time Varying, W/O Noise. 



The other case for output error is 



(4.11) 

y(t) + ajy(t-l) + a2y(t-2) + a 3 y(t-3) = bju(t-l) + v(t) + ajv(t-l) + a2v(t-2) + a-v(t-3) 

where v(t) is white noise of zero mean for both systems. In this simulation, all 
parameters gain are used in equation (4.9) and tested with different input-signal to 
noise ratio, i.e., S/N = 10 and S/N = 2. The corresponding results are in Figures 4.11 
and 4.12 for using equation error and Figures 4.13. 4.14 are from output error model. 

In these results, the outputs of equation error exhibit less variations the results 
from output error. The comparison of the two model sets have not been carried out 
because there is no general result that clearly factors either model set. The amplitude 
of variation always depends on the system itself. 

Final simulation is extended of Example 4.3 with tw r o different types of noise 
model (S/N = 10) for the verification of parallel algorithm and illustrated in Example 
J.5. 



Example 4.5)( tracking with time-varying parameters and noise ) 

We simulated the combination of two system shown in Example 4.3 and 4.4. 

The systems are given by 

y(t) + ajv(t-l) 4- a 2 y(t-2) + a-y( t-3) = bju(t-l) +v(t) 
for equation error, 
and 

y(t) 4- a j y( t- 1 ) + a->y(t-2) 4- a 3 y(t-3) = b^u(t-l) 4- v(t) 4- a | v( t- 1 ) 4- a->v(t-2) 4- a 3 v(t-3) 
for the output error case. The parameters above change as follows 
a | = 1.8 

a 2 = -1.08 30 < NB < 60. 90 < NB < 120 

a 3 = 0.216 

bj = 1.0 
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Figure <4.1 1 Example 4.4 - Tracking of Parameter with Equation Error, S/N 
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figure *4. 1 3 Hxample 4 *4 - Tracking of Parameter with Output lit t or, S/N 




58 



I igure 4.1*4 Lxample 4 4 - I racking of Parameter with Output Hrror, S/N 



otherwise 



aj= 1.5 
a 2 = -0.75 
a 3 = 0.125 
b, = 1.3 

The results are given in Figures in 4.15 and 4.16. 

All parameters are converged very well under time-varying and noise 
condition. 

in the discussion so far, we simulated several possible different conditions to 
investigate the behaviour of the parallel algorithm. The simulation study shows that 
we can obtain a reasonable parameter tracking in the presence of noisy measurements. 
The time block approach averages out the effects of random noise and provides an 
adjustable flexibility to improve the convergence rate of the parameter estimates. 
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Figure 4.15 Example 4.5 - Parameter Tracking with Time-Varying, S/N = 10 of Equation Error. 
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V. PERFORMANCE COMPARISON AND CONCLUSION 



A. COMPUTATIONAL PROBLEM OF SERIAL RECURSIVE LEAST- 
SQUARES 

The estimation of the model parameters using a parallel structure and standard 
least-squares algorithm has been described in Chapters II and III. We have discussed 
two possible ways of implementing the estimation algorithm: serial as in equation 
(2.13) and parallel as described in Chapter III. In this section we want to summarize 
the various trade-oils of the two algorithms, on the basis of their complexity in terms 
of computational power required. In particular consider the number of algebraic 
operations required for one iteration as a comparison of the two algorithms. For the 
case of serial implementation consider the two recursions in equation (2.13) which both 
update the covariance matrix as well as the parameter estimates. From a numerical 
standpoint the recursion of the covariance matrix P(t) exhibits the highest complexity, 
since it requires a number of operations which encreases with d 2 , d x d being the 
dimensions of P(t). Also it is worth mentioning that P(t) is not updated by direct 
application of the recursion in (2.13) but it is rather computed by LUD factorization 
[Ref. 3: p. 336] in order to preserve its positive definiteness in the presence of numerical 
errors. In this way it is possible to compute the number of arithmetic operations 
required for matrix inversion and parameter estimation. Table 2 shows the total 
number of arithmetic operations for updating the parameters and also can be used to 
determine the time required per iteration. 

TABLE 2 

NUMBER OF ARITHMETIC OPERATION FOR UPDATING INPUT 





Matrix Inv. 


Parameter Est. 


Total 


Addition 


1.5d 2 + 1.5d 


d 2 + 2d 


2.5d 2 -I- 3.5d 


Multiplication 


1.5d 2 -t-5.5d 


d 2 + 2d 


2.5d 2 + 7.5d 


Division 


d 


d 


2d 


Subtraction 




1 


1 
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As shown in Table 2, for example, let us consider a 3rd order autoregressive model 
with three poles as in example 4.1. In this case the dimension of ^(represented by d) is 
4. and a total 54 additions, 70 multiplications, 8 divisions and 1 subtraction are 
required for each parameter updating It is clear that the sampling time, T g encreases 
with the square of <p(d) in the standard recursive least-squares. As a consequence of 
this we can see that the rate at which we can enter the data <p(t) into the algorithm is 
proportional to 1/d 

B. COMPUTATION OF PARALLEL STRUCTURE 

In the case of parallel structure, the computations required by the limitation in 
the maximum sampling frequency are performed in each cell at each clock cycle. As 
discussed in the previous chapters the estimation is performed by two distinct 
processors: PI which performs matrix triangularization and operates within each time 
block, and P2 which solves the triangular system and it operates at the end of the time 
block. Each of these two processors has different complexities as shown below. The 
number of operations in each cell of the triangularization processor PI is shown in 
Table 3. Since the cells operate in parallel, and in a pipeline fashion, at the end of each 
computing cycle we can enter new data, so the throughput(i.e., the rate at which we 
can feed new data into the processor) is constant and independent of the complexity of 
the system. We can say therefore that in this case the sampling time is of order T = 
0(1), constant, compared with Tp = 0(d*") in the recursive implementation. 

TABLE 3 

NUMBER OF ARITHMETIC OPERATION IN PARALLEL STRUCTURE 



Section 


Triangular Systolic 


Linear Systolic 


Cell 


Boundary 


Internal 


Boundary 


Internal 


Addition 


1 


2 




1 


Multiplication 


3 


4 




1 


Division 


1 








Subtraction 






1 
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On the other hand, the senai implementation, the estimated parameters are 
available at any instant of time. This is not the case for the parallel algorithm 
presented since it has to wait until the end of the block to solve for the linear system of 
equation. Let us consider the time interval for parameters updating between both 
cases of the serial computation and parallel structure. This time is related to the 
amount of serial computation required by the tu r o algorithm. 

For example, let us consider the particular case shown Table 2 for serial 
computation, and assume the time block to be of length N = lOn (n = number of 
order) in the parallel structure. Although w T e cannot say a priori which cell takes 
longer computing time, by refering to Table 3 we can conjecture that the internal and 
the boundary cells of triangular systolic array perform more complex operations than 
the linear array section. In Table 4, w'e compute the number of arithmetic operations 
to performed for each clock cycle in the serial computation and one entire time block 
(N+2d) in the parallel structure shown in Figure 3.9. The former one based on either 
the number of operations in internal cells or boundary cells of the triangular systolic 
array. 



TABLE 4 

COMPARISON OF SERIAL AND PARALLEL STRUCTURE (DIM.<D= 4) 





Serial Computation 


Parallel Computation 


Basis 




Boundary 


Internal 


Addition 


54 


38 


76 


Multiplication 


70 


104 


152 


Division 


S 


38 




Subtraction 


1 







As shown in Table 4, it is clear that the number of required arithmetic operations 
in the serial computation per clock cycle is approximately half the number of 
operations needed for the entire time block (N + 2d) in the parallel structure. Since the 
number of operations is related with the time interval between parameters updating, we 
can have an idea of the rate at which we update the parameters. However we must 
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keep in mind that the serial algorithm updates the parameters on the basis of one 
sample only, while the parallel algorithm uses all samples within the time block, the 
benefit of this is a better performance of the parallel algorithm in case of noisy signals. 
This is in line with the fact discussed earlier that the sampling rate for the parallel 
algorithm can be much faster than the serial one, thus allowing more data to be 
processed in a given amount of time. 



65 



APPENDIX 

COMPUTER PROGRAM OF PARALLEL ALGORITHM 

PURPOSE 

* * * * * * * * * * * * * * * # * * * * * * * 



This program calculated the estimated weight vector of least-squares using 
parallel algorithm, the input is a coefficient of polynomial ; the form 



H(z) = 



BiZ 



-1 



1 + Ajz' 



+ A->z"~ + A~z° 



where the all coefficients are given interactively and real. This program is limited with 
system model as shown above transfer function, same type of denominator with 1st, 
2nd. and 3rd order cases and one input parameter. If run this program, should be 
extended virtual storage capacity to 2M in advance due to the large array. This 
program consists of main and four different subroutines which are performed each 
cell's function. 



VARIABLE DEFINITION 

REAL A(3 00, 5,200) ,U(300.5,300) , KIN (5,5), CINP (5,5), SINP (5,5) 

REAL COUP (5.5), SOUP (5,5) , SOUN( 5 , 5) , CINN( 5 , 5 ) , SINN( 5 , 5 ) ,XOUT( 5 , 5 ) 
REAL SX(5,5) ,A1,B1 , YNP , YP , INUP , SIG , PI , C0UN( 5 , 5 ) ,EA1 ,EB1 ,NSX(5 , 5) 
REAL A 2 , YPM1 , ABB , EA2 , YPN , XA1 , XB1 , PI 10 , ZIN (5,5), ZOUT (5,5) 

REAL WK( 5 , 5 ) , RE3( 12 ) , AM, V,S ,DIN(6 , 6. 15) ,DOUT(7 , 7 , 15) 

REAL XLIN(7 ,7,15) ,WKIN(5,5) ,WK1(5,5) ,XIB 

INTEGER K, L ,N ,NT , IK, IL ,NBN, XI ,L1,N,IT,NI, IX,NM1 ,NB , K2 , L2 , KR ,LR ,NC 



******** INITIALIZE THE GIVEN TRANSFER FUNCTION AND ALL VARIABLES **** 



,N 



? 1 



1000 PRINT *, 'ENTER THE SIZE OF MATRIX N BY N ** N ? 1 
READ * N 
PRINT *, 1 N = 1 
PRINT *, 1 1 

PRINT *, 'ENTER THE ORDERS OF DEN. OF DIFF. EQ.**N0 ?' 

READ *,NO 

PRINT *, 'NO = 1 /NO 
PRINT *, ' ' 

PRINT *, 'HOW MANY BLOCK DO YOU WANT TO ITERATE **N3N 
READ * NBN 
PRINT *, 'NBN = ' ,NBN 
IF ( NO ,EQ . 1) THEN 

PRINT *, 'FORMAT OF 1ST ORDER DIFF.EQ.***Y(Z) = B1*U(Z-1) + A1*Y(Z- 
* 1 ) 1 

PRINT *, 'ENTER THE COEFF . OF A1 ?' 

READ * A1 

PRINT * ' A1 = ' A1 

PRINT 'ENTER THE COEFF. OF B1 ?' • 
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***** 



READ *31 

PRINT *, '31 = ' , £1 
WRITE (6 , 7C0) 

ELSEIF ( NO .EQ. 2) THEN 
PRINT *,' FORMAT OF 2ND ORDER 
*1)- A2*Y (Z-2 ) 1 
PRINT *, ’ENTER THE COEFF. OF 
READ * A1 

PRINT *, 'A1 = ' ,A1 

PRINT *,' ENTER THE COEFF. OF 

READ * A2 

PRINT ”, 'A2 = 1 ,A2 

PRINT * , 1 ENTER THE COEFF . OF 

READ * B1 

PRINT *, '31 = ' ,31 
WRITE (6,720) 

ELSEIF ( NO .EQ. 3) THEN 
PRINT * , 1 FORMAT OF 3RD ORDER 
*1)- A2*Y(Z-2) + A3*Y(Z-3) 1 
PRINT *, ‘ENTER THE COEFF. OF 
READ * A1 

PRINT *, 1 A1 = ' ,A1 

PRINT ENTER THE COEFF. OF 

READ * A2 

PRINT *, ' A2 = ' ,A2 

PRINT *, 'ENTER THE COEFF. OF 

READ * A3 

PRINT *, 'A3 = 1 ,A3 

PRINT *, 'ENTER THE COEFF. OF 

READ *31 

PRINT ”, '31 = ' ,B1 
WRITE (6,740) 

ELSE 

PRINT *, 'ERROR ** PLEASE TRY 

GOTO 1000 

END IF 

NM1=N-1 

NB =15*NO 



DIFF . EQ . ***Y(Z) = B1*U(Z-1) + A1*Y(Z 
A1 ?' 

A2 ?' 

B1 ?' 

DIFF.EQ.***Y(Z) = B1*U(Z-1) + A1*Y(Z 
A1 ?' 

A2 ?' 

A3 ?' 

B1 ?' 

AGAIN AMONG THOSE 1,2,3 FOR NO' 



SIG = 1. 

PI = 3.141592 
PI10 = PI/10 
IX = 1 
S = 0.07071 
AM = 0. 

YP = 0. 

YPM = 0. 

YPM1 = 0. 

VM1 = 0 
VM2 = 0 
VM3 = 0 

INUP = COS(PIIO) + COS (PI10*2 ) + COS(PI10*5)+ COS(PI10*6) 



MAIN PROGRAM BEGIN ******** 



DO 10 IK = 1 , N 
DO 20 IL = 1,N 

IF ((IK .EQ. IL ) .AND. (IK .LT. N ) ) THEN 
COUP( IK, IK)=1 . 

SOUP ( IK, IK)=0 . 

SX(IK, IL) = SIG 
ELSE 

SX( IK, IL) = 0. 

END IF 

IF (IK .EQ. N) THEN 
ZIN ( IK , IL) = 0 
WK( IK , IL) = 0 
END IF 

IF (K .NE. 1) THEN 
XIN(lK,IL)=0. 
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on o ooonnooonono 



END IF 

20 CONTINUE 
10 CONTINUE 



DO 330 IB = 1 ,NBN 
DO 30 NT = l.,NB 

* ****** ?xmE VARYING INPUT ******* 



IF ( ( (IB.GT.30) .AND. (IB. LT. 60)) .OR. ((IB. GT. 90) .AND. (IB. LT. 120))) 
* THEN 

A1 = 1.3 
A2 = 1.08 
A3 = 0.216 
B1 = 1.0 
ELSE 

A1 = 1.5 
A2 = 0.75 
A3 = 0.125 
B1 = 1.3 
END IF 

IF (NT .LE. NB-N+1) THEN 
CALL GAUSS ( IX, S, AM, V) 

IF (NO .EQ. 1 ) THEN 
YNP = B11*INUP - A11*YP 
YNP = B1*INUP + A1*(YP+V) 

JF = 1 

A ( NT , JF , NT ) = YP 

A(NT, JF+1 ,NT) = INUP 
A(NT , JF+2 , NT) = YNP 
ELSEIF ( MO .EQ. 2) THEN 
YNP = B1*INUP + A1*(YP+V)- A2*YPM 
YNP = B1*(INUP+V) - A1*YP 
YNP = B1*INUP + A1*YP-A£*YPM 
JF = 1 

A(NT, JF,NT) = YPM 
A(NT , JF+1 ,NT) = YP 
A(NT , JF+2 ,NT ) = INUP 
A (NT, JF+3 , NT) = YNP 
ELSEIF ( NO .EQ. 3) THEN 
C YNP = B1*INU? + A1*YP- A2*YPM+A3*YPN1+V+A1*VM1+A2*VM2+A3*VM3 
YNP = B1*INUP +A1*YP- A2*YPN + A3*YPM1 +V 
C YNP = B1*INUP + A1*YP-A2*YPM+ A3*YPM1 
JF = 1 

A(NT, JF,NT) = YPM1 

A (NT, JF+1, NT) = YPM 
A(NT, JF+2, NT) = YP 
A (NT, JF+3, NT) = INUP 
A (NT, JF+4,NT) = YNP 
END IF 

******* TRANSFORMATION OF INPUT DATA WITH SKEW ORDER ***** 

DO 40 J = 1,N 
IT = J-l 
NI = NT+IT 

U(NI,J,NI) = A(NT, J,NT) 

40 CONTINUE 
END IF 

****** CALCULATE THE ORTHOGONAL TRIANGULAR MATRIX BY GIVENS 
rotation AND INITIALIZE THE ORTHOGONAL MATRIX ******* 

******* CALCULATE THE BOUNDARY AND INTERNAL CELLS ******* 



DO 50 K = 1 ,N-1 

DO 60 L = 1,N 

IF ( NT .LT. N ) THEN 
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CINP(K,L+1) = COUP ( K , L ) 
SINP(K.L+1) = SOUP(K,L) 
SX(X,L) = SX(K,L) 

END IF 



IF (t'K.LE.L) .AND. (K+L .LE. 
IF (K .E Q. 1) THEN 

x:n(k,l) = u(nt,l,nt) 

END IF 



NT+1 ) )THEN 



******* CALCULATE THE BOUNDARY CELL ****** 



IF ( K .EO. L) THEN 

CALL BND f K , L , NT , XIN , SX , COUP , SOUP ) 

CINN(K,L+1) = COUP(K.L) 

SINN(K,L+1 ) = SOUP(K,L) 

IF (NT .EQ. NB) THEN 
CINN(K,L+1) = 0. 

SINN(K,L+1) = -1. 

END IF 

******* CALCULATE THE INTERNAL CELL **** 



ELSEIF(K .NE. L) THEN 

CALL INRN (K , L ,NT , XIN , XOUT , SX ,NSX , CINP , SINP ) 

CINN(K,L+1) = CINP(K,L) 

SINN(K,L+1 ) = SINP(K,L) 

SX(K,L) = NSX(K,L) 

END IF 

ELSEIF ( K .GT. L ) THEN 
SX(K,L)=0. 

END IF 

60 CONTINUE 
50 CONTINUE 

DO 70 K1 = 1 ,N-1 
DO 30 LI = 1,N 
IF ( K1 .NE. LI) THEN 
CINP (K1 , LI ) = CINN(K1 , LI ) 

SINP(K1 ,L1) = SINN(K1 ,L1 ) 

END IF 

XIN(K1+1 , LI ) = X0UT(K1 , LI ) 

80 CONTINUE 
70 CONTINUE 

IF ( NO .EQ. 1) THEN 
YP = YNP 

ELSEIF ( NO .EQ. 2) THEN 
YPM = YP 
YP = YNP 

ELSEIF ( NO .EQ. 3) THEN 

YPN1 = YPM 

YPM = YP 

YP = YNP 

VM3 = VM2 

VM2 = VM1 

VM1 = V 

END IF 

NX = NT 

INUP = COS (PI10*NX)+COS (PI 10*NX*2 )+COS (PI10*NX*5 )+COS (PI10*6*NX) 
****** CALCULATE THE ESTIMATED COEFFICIENT WITH RESET ******* 

NC = 1 

IF (NT .EQ. NB ) THEN 
DO 600 IM = 1 , N 
DO 610 JM = 1 , N 
XOUT ( IM , JM) = 0. 

XIN ( IM , JM) = 0. 

610 CONTINUE 
600 CONTINUE 

DO 400 NR = 1 , 2*N-1 
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DO 410 KR = 1,N 
DO 420 LR = 1,N 

IF ( (KR .ME . LR) .AND. (KR .LT. N) ) THEN 

CALL INRN (KR.LR.NR.XIN.XOUT.SX.NSX.CINP.SIN?) 

txx*x**** **************************** *x*x****rt7t* 

IF (LR .NE. N ) THEN 

din;kr,lr,nr) = xout(kr,lr) 

END IF 

CINN(KR,LR+1) = CINP(KR,LR) 

SINN(KR,LR+1) = SINP(KR,LR) 

SX(KR , LR) = NSX(KR.LR) 

ELSEIF ((KR .EQ. LR ) . AND . (KR .LT. N) ) THEN 
XIN(KR,LR) = 0. 

CALL 3ND ( KR ( LR , NR . XIN , SX t COUF . SOUP ) 

CINN ( KR , LR+1 ) = COUP(KR,LR) 

SINN(KR,LR+1 ) = SOUP(KR,LR) 

END IF 

IF(KR .EQ. N) THEN 
IF (LR .EQ. N) THEN 

CALL LINBD (N , KR, LR . NR , ZIN , XIN , WK , WK1 . SX ,REB ,NC) 

IF ((NR.EQ. 3) .OR. (NR. EQ. 5) .OR. (NR .EQ. 7). OR. (NR .EQ.9)) THEN 
END IF 

ELSEIF ((LR .GT. 1) .AND. (LR .LT. N) ) THEN 



CALL LINNT ( KR , LR f NR , ZIN . XLIN . WK t WKIN . ZOUT ) 

XXXXXXXXXXXXXXXXXXXXXX^XXXXXXXXXXXXXXXXXXX 

END IF 
END IF 

420 CONTINUE 
410 CONTINUE 

DO 430 K2 = 1,N 
DO 440 L2 = 1,N 

IF (( K2 .ME. L2 ) .AND. (K2 .LT. N) ) THEN 
CINP ( K2 , L2 ) = CINN(K2 ,L2) 

SINP (K2 , L2 ) = SINN(K2,L2) 

END IF 

IF (L2 .LT. N-l ) THEN 
XIN(K2+1 ,L2+1) = DOUT(K2 , L2 ,NR) 

DOUT(K2 , L2 ,NR+1 ) = DIN(K2 ,L2 ,NR) 

ELSEIF (L2 .EO. N-l) THEN 
XLIN(N,K2+1 ,NR+2) = DIN(K2 ,L2 ,NR) 

ELSEIF (L2 .EO. N) THEN 
XIN(K2+1 ,L2) = XOUT(K2,L2) 

END IF 

IF (K2 .EQ. N) THEN 
ZIN(K2 , L2+1) = ZOUT(K2 ,L2) 

IF (L2 .EQ. N) THEN 
WKIN(K2 ,L2-1) = WK1 (K2 ,L2) 

ELSE 

WKIN(K2,L2-1) = WK(K2 ,L2 ) 

END IF 
END IF 

440 CONTINUE 
430 CONTINUE 
400 CONTINUE 



******* REINITIALIZE FOR NEW BLOCK PROCESSING ***** 



DO 470 N3 = 1 , 3*N-3 
DO 450 K3 = 1 ,N 
DO 460 L3 = 1 ,N 
IF (K3 .NE. 1 ) THEN 
XIN(K3 ,L3) = 0. 

END IF 
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XLIN(K3,L3,N3) = 0. 

DIN(K3,L3 ,N3) = 0. 

DOUT(K3 , L3 ,N3) = 0. 

IF ( X3 .LT. N) THEN 
CIN?(K3 ,L3) = 1. 

SINP(K3,L3) = 0. 

IF ( K2 .EO. L3 ) THEN 
SX(K3,L3) = 3IG 
ELSE 

SX(K3 , L3) = 0. 

END IF 

ELSEIF ( K3 .EQ. N ) THEN 
WK(K3 ,L3) = 0. 

ZIN(K3 , L3 ) = 0. 

END IF 

460 CONTINUE 
450 CONTINUE 
470 CONTINUE 
XIB = IB 

IF ( NO .EQ. 1) THEN 

WRITE (6,710) IB , REB ( 5 ) , REB ( 3 ) 

SX(1,3) = REB(5)*SIG 

SX(2,3) = REB(3)*SIG 

ELSEIF ( NO .EQ. 2) THEN 

WRITE (6,730) IB.REB(5) ,REB(7) ,REB(3) 

SX(1 ,4) = REB (7 ) *SIG 
SX(2 ,4 ) = REB ( 5 ) *SIG 
SX(3,4) = REB(3)*SIG 
ELSEIF ( NO .EQ. 3) THEN 

WRITE (6,750) XIB , REB( 5 ) , REB(7 ) , REB (9) , REB(3 ) 

SX( 1 , 5 ) = REB(9)*SIG 
SX( 2 , 5 ) = REB(7)*SIG 
SX(3 , 5 ) = REB(5)*SIG 
SX(4,5) = REB(3)*SIG 
END IF 
END IF 

30 CONTINUE 
330 CONTINUE 
STOP 

700 FORMAT ( 1 1 1 , 3X, ' BLOCK NO .', 6X, 1 COEFF . A1 1 , 9X, 1 COEFF . B1 ' ) 

710 FORMAT (3X, 14 , 11X, 2 (F10 .7 ,7X) ) 

720 FORMAT( ‘1' ,3X, 'BLOCK NO. 1 ,6X, 'COEFF. A1 1 , 9X, ' COEFF . A2 1 , 9X ,' COEFF 
*. B1 ’ ) 

730 FORMAT (3X, 14 , 12X, 3 (F10 .7 , 7X) ) 

740 FORMAT (' 1 ', 3X ,' BLOCK NO .', 6X ,' COEFF . A1 ', 9X ,' COEFF . A2’,9X,'COEF 
*F. A3' ,9X, 'COEFF. B1 ' ) 

750 FORMAT (3X, F4 . 0 , 12X, 4(F10 . 7 , 7X) ) 

END 



********** SUBROUTINE GROUP FOR CELL'S FUNCTION ********* 
************************************** 



SUBROUTINE BND (K . L , NT . XIN . SX , COUP . SOUP ) 

********* ******7t*7Z *7***7*** *7t* *7r****-k****-k 

REAL XIN (5, 5), SX (5, 5), COUP (5, 5), SOUP (5,5) 

REAL NSQ 

INTEGER K , L , NT 

IF ( XIN(K,L) .EQ. 0) THEN 

COUP(K,L)=l . 

SOUP(K,L)=0. 

SX(K, L) = SX(K,L) 

ELSE 

NSO = SX(K,L)**2+XIN(K,L)**2 
COl)P(K,L)=SX(K,L)/NSO 

soup(k,l / =xin(k,l)/nsq 

SX(K,L)=1. 

END IF 
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RETURN 

END 



SUBROUTINE INRN (K.L,NT.XIN.XOUT,SX,NSX,CINP ( SINP) 
************** *********«***********:**** *********** 

REAL XIN (5 , 5) , SX(5, 5) , CINP(5 , 5) , SINP(5 , 5) ,XOUT(5,5) ,NSX(5,5) 
INTEGER K , L , NT 

XOUT(K,L)=-SINP (K,L)*SX(K .L)+CINP(K .L)*XIN(K.L) 
NSX(K,L)=CINP(K,L)*SX(K,L)+SINP(K,L)*XIN(K,L) 

RETURN 

END 



SUBROUTINE L INNT (KR , LR , NR , ZIN , XLIN . WK . WKIN , ZOUT ) 

REAL ZIN(5,5),XLIN(7,7,15),WK(5,5) , ZOUT (5 , 5 ) ,WKIN(5,5) 
INTEGER KR,LR,NR 

ZOUT (KR , LR) = ZIN(KR.LR) + XLIN ( KR , LR , NR ) *WKIN (KR , LR ) 
WK ( KR , LR ) = WKIN(KR,LR) 

RETURN 

END 



SUBROUTINE . LINBD (N , KR . LR , NR . ZIN , XIN . WK . WK1 . SX . REB . NC ) 

REAL ZIN (5,5), XIN ( 5 , 5 ) , WK ( 5 , 5 ) , WK1 (5 , 5) , SX(5 , 5 ) , REB (5) 
INTEGER KR,LR,NR,N,NC 
WK(KR,LR) = SX(KR,LR) 

WK1 (KR,LR) = XIN(KR,LR) - ZIN(KR,LR) 

SX(KR,LR) = WK1 (KR , LR) 

REB (NR) = WK( KR,LR) 

RETURN 

END 
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